Decoding information about cognitive health from the brainwaves of sleep

Sleep electroencephalogram (EEG) signals likely encode brain health information that may identify individuals at high risk for age-related brain diseases. Here, we evaluate the correlation of a previously proposed brain age biomarker, the “brain age index” (BAI), with cognitive test scores and use machine learning to develop and validate a series of new sleep EEG-based indices, termed “sleep cognitive indices” (SCIs), that are directly optimized to correlate with specific cognitive scores. Three overarching cognitive processes were examined: total, fluid (a measure of cognitive processes involved in reasoning-based problem solving and susceptible to aging and neuropathology), and crystallized cognition (a measure of cognitive processes involved in applying acquired knowledge toward problem-solving). We show that SCI decoded information about total cognition (Pearson’s r = 0.37) and fluid cognition (Pearson’s r = 0.56), while BAI correlated only with crystallized cognition (Pearson’s r = −0.25). Overall, these sleep EEG-derived biomarkers may provide accessible and clinically meaningful indicators of neurocognitive health.

One recently proposed sleep EEG-based biomarker of brain health is "brain age" (BA) 14 . Its difference from chronological age, called the "brain age index" (BAI), characterizes the extent to which an individual's observed neurophysiologic functioning during sleep deviates from what would be expected for their chronological age. In excess, BAI has been linked to higher mortality 15 and an underlying burden of disease, including dementia 16 , HIV infection 17 , hypertension 14 , and diabetes 14 . Although BAI gives insight into the general functional capacity of the brain, it is not explicitly designed to decode information about neuroanatomic integrity and its relationship with cognition has not yet been evaluated.
Here, our aim was to take a novel approach to measuring brain health by developing methods to decode neurocognitive information from sleep. Specifically, we developed a series of novel markers of brain health termed Sleep Cognitive Indices (SCIs). Unlike BAI, the SCIs are explicitly designed to correlate with specific components of cognition. Such indicators of brain health could be important for identifying age-related brain diseases which preferentially affect specific aspects of cognition, or for tracking the effects of interventions targeted at specific cognitive domains. We hypothesized that specific combinations of sleep-EEG features would be correlated with performance on specific cognitive tasks and that it may thus be possible to develop EEG-based indicators specifically correlated with different types of cognitive abilities. In comparison, we expected participants with elevated BAI to perform worse on cognitive assessments but reasoned that this correlation is likely nonspecific since BAI was developed to predict age.

Methods
Design and participants. We conducted a single-center, cross-sectional observational study consisting of adults (≥ 18 years of age) who underwent diagnostic polysomnography (PSG) between November 2018 and October 2019 at the Massachusetts General Hospital Sleep Laboratory. Enrolled participants completed a cognitive test battery within 40 days of their PSG. Patients were excluded if they had a baseline diagnosis of dementia or a learning disability, were unable to perform the cognitive tests due to a lack of English proficiency or impairment (motor, visual, or hearing), or if they had prior experience with the cognitive test battery. This study of human subjects was approved by the Mass General Brigham Institutional Review Board. All methods were performed in accordance with the study protocol and the Declaration of Helsinki. Written informed consent was provided by all participants. The number of subjects and their characteristics are summarized in Table 1.
The American Academy of Sleep Medicine (AASM) provides guidelines for classifying consecutive 30-s epochs of EEG signals into 5 "stages" 19 , including awake (W), rapid eye movement (REM) sleep, and 3 stages of non-REM sleep (N1, N2, N3). EEG epochs were classified following these AASM guidelines by licensed sleep technicians, and the assigned stages were subsequently reviewed and revised as needed by a sleep physician. Only central electrode signals (C3-M2 and C4-M1) were used for our main analysis, as the public sleep dataset that we used for external validation, the Sleep Heart Health Study, included only central electrodes. We further explored model performance when either occipital or frontal electrodes were available for analysis in addition to central electrodes.
Spindle and slow oscillation characterization. Sleep spindle and slow oscillation features were obtained using Luna software 9 (http://zzz.bwh.harvard.edu/luna/). Spindle detections were included only for epochs scored as N2 and N3. A single electrocardiogram (ECG) electrode was zero-phase band-pass filtered from 0.3 to 40 Hz and used to apply ECG-correction to remove ECG artifacts from the EEG signals. Slow oscillations were detected by band-pass filtering between 0.2 and 4.5 Hz. Positive-to-negative zero-crossings were then detected in the filtered signal, and intervals between 0.8 and 2 s were designated as slow oscillations if they had a negative peak higher than the median across all zero-crossings and a peak-to-peak amplitude higher than the median. All spindle and slow oscillation features used for analysis are summarized in Table 2.
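The zero-crossing step described above can be illustrated with a minimal sketch. It is not Luna's implementation: the function name `detect_slow_oscillations` is hypothetical, the input is assumed to be already band-pass filtered (0.2 to 4.5 Hz), and "negative peak higher than the median" is interpreted here as a negative-peak magnitude above the median over all intervals between consecutive zero-crossings.

```python
from statistics import median


def detect_slow_oscillations(filtered, fs, min_dur=0.8, max_dur=2.0):
    """Return (start, end) sample indices of candidate slow oscillations.

    Assumption: `filtered` is a list of samples that has already been
    band-pass filtered (0.2-4.5 Hz); `fs` is the sampling rate in Hz.
    """
    # Positive-to-negative zero-crossings of the filtered signal.
    crossings = [i for i in range(1, len(filtered))
                 if filtered[i - 1] > 0 and filtered[i] <= 0]

    # One candidate interval per pair of consecutive crossings.
    intervals = []
    for start, end in zip(crossings, crossings[1:]):
        segment = filtered[start:end]
        neg_peak = -min(segment)                    # magnitude of the trough
        peak_to_peak = max(segment) - min(segment)
        intervals.append((start, end, neg_peak, peak_to_peak))
    if not intervals:
        return []

    # Thresholds: medians taken over all intervals between crossings.
    neg_median = median(iv[2] for iv in intervals)
    p2p_median = median(iv[3] for iv in intervals)

    return [(s, e) for s, e, neg, p2p in intervals
            if min_dur <= (e - s) / fs <= max_dur
            and neg > neg_median and p2p > p2p_median]
```

Because the thresholds are relative (medians within the recording), the number of detections depends on the recording itself rather than on a fixed microvolt criterion.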
Cognitive test battery. All participants were asked to complete the NIH Toolbox Cognition Battery 20 . The NIH Toolbox Cognition Battery is one of the core domains in the NIH Toolbox for Assessment of Neurological and Behavioral Function. It consists of seven instruments that assess the following functional constructs: Flanker Inhibitory Control and Visual Attention (ICA), Dimensional Change Card Sort (DCCS; measures cognitive flexibility), List Sorting Working Memory (LSWM), Picture Sequence Memory (PSM; measures visual episodic memory), Pattern Comparison Processing Speed (PCPS), Picture Vocabulary (PV; measures vocabulary comprehension), and Oral Reading Recognition (ORR; measures reading decoding). Of these seven instruments, PV and ORR are classified as measures of crystallized cognition and the rest as measures of fluid cognition. Fluid cognition reflects a collection of cognitive processes involved in problem-solving, abstract thinking, and reasoning that are independent of past knowledge acquired through experience and education. In contrast, crystallized cognition represents a group of cognitive processes that apply prior knowledge from experience and education to solve problems. Although different, these two cognition types are tightly correlated components of total cognition. Despite this association, studies often examine these subdivisions of total cognition separately to better understand and treat neurologic conditions 21,22 . Additionally, both crystallized and fluid cognition have been shown to change with age 23,24 . For detailed instrument information, see Table S1. In addition to scores for individual tests, three composite scores for fluid, crystallized, and total cognition are provided. Absolute scores for each of the seven tests and the three composite scores were used for analyses (all non-age adjusted).

Statistical analyses.
Developing the sleep cognitive indices. To develop SCI for specific cognitive measures, we created a series of regression models. The dependent variable of each model was a task's absolute scores. Independent variables in SCI models included EEG features that were derived from spindles and slow waves, as well as the features in Table 2. Demographic variables, such as age and sex, were not included since our primary aim was to develop EEG-based indicators of neurocognitive health and evaluate how well brain signals alone could capture neurocognitive status, rather than produce accurate predictions of cognitive performance per se. EEG-based models were evaluated with a goodness of fit test (see below) in comparison to a full model with demographic variables. Because the number of independent variables (160 × 3 + 42 + 10 = 532 with all electrodes, 160 + 42 + 10 = 212 with one electrode set, Table 2) exceeded the number of participants (150) in our dataset, we used linear regression with Elastic Net regularization to prevent overfitting and to force regression models to select only the features most relevant to the target task. Note that Elastic Net regularization automatically selects which features to retain in the model, and thus the number of features selected varies depending on the specific prediction task and data used to develop the model. To avoid overestimation of regression performance, model training and feature selection were restricted to training data, while model performance was evaluated strictly on held-out test data. In summary, each SCI model is generated by extracting EEG measures of interest (determined by Elastic Net regularization), multiplying these EEG measures by the regression coefficients of the model of interest (fluid, crystallized, total), and adding the results to obtain a single number (SCI score).
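As a minimal sketch of the final step summarized above, an SCI score can be written as a weighted sum of the retained EEG features. The function name `sci_score`, the feature names in the usage note, and the optional intercept are illustrative assumptions; the text specifies only the multiply-and-add step.

```python
def sci_score(features, coefficients, intercept=0.0):
    """Weighted sum of EEG features for one participant.

    `features` maps feature name -> value; `coefficients` maps feature
    name -> fitted regression coefficient. Elastic Net drives many
    coefficients to exactly zero, so only retained features contribute.
    The intercept is an assumption; it shifts the score but does not
    change its correlation with a cognitive test.
    """
    return intercept + sum(
        coef * features[name]
        for name, coef in coefficients.items()
        if coef != 0.0
    )
```

For example, with hypothetical features `{"n2_spindle_density": 2.0, "n3_line_length": 5.0}` and coefficients `{"n2_spindle_density": 0.5, "n3_line_length": -1.0}`, the score is 0.5 × 2.0 − 1.0 × 5.0 = −4.0.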
For SCI model optimization and testing model performance, we used nested tenfold cross-validation (CV) (Fig. S1). For each functional construct and cognitive composite score, the outer CV loop separated data into ten folds, where each fold contained 15 distinct participants. Nine folds were used for model fitting (n = 135) and the other fold for model testing (n = 15). This was done ten times, such that testing was performed once on each fold. During model fitting, Elastic Net regression was performed to select the best subset of features and their coefficients. Strict separation of training and test set was maintained to achieve statistically unbiased estimates of out-of-sample performance. Our reported performance results are based on test data only.
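The outer loop of this scheme can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the function name `outer_folds`, the shuffling, and the seed are hypothetical, and the inner loop (Elastic Net hyperparameter selection) is indicated only by a comment because it must run on the training portion alone.

```python
import random


def outer_folds(participant_ids, n_folds=10, seed=0):
    """Yield (train_ids, test_ids) pairs for the outer cross-validation loop."""
    ids = list(participant_ids)
    random.Random(seed).shuffle(ids)          # assumption: random assignment
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test_ids = folds[k]
        train_ids = [p for j, fold in enumerate(folds) if j != k for p in fold]
        # An inner cross-validation over `train_ids` only would select the
        # Elastic Net penalty here; `test_ids` are touched once, for scoring.
        yield train_ids, test_ids
```

With 150 participants and ten folds, each split has 135 training and 15 test participants, and every participant appears in exactly one test fold.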
In addition to the new SCI models, we calculated the Brain Age Index (BAI) using a previously described machine learning model 14 . BAI includes features from the waveform time domain (e.g. line length and kurtosis, which reflect EEG signal complexity) and from the frequency domain (e.g. spectral power of the delta (0.5-4 Hz), theta (4-8 Hz), alpha (8-12 Hz) bands, and their ratios 14,25 ). All features are summarized in Table 2. Features of missing sleep stages were imputed using the K-nearest neighbor approach (K = 10). We used Pearson correlations to measure the degree to which the BAI correlates with cognitive test scores. Statistical significance was defined using a p-value < 0.05.
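Two of the time-domain features named above can be sketched in a few lines. The definitions used here are common ones and are assumptions about the exact convention: line length as the sum of absolute differences between consecutive samples, and kurtosis as the non-excess (Pearson) fourth standardized moment.

```python
def line_length(x):
    """Sum of absolute sample-to-sample differences of a signal segment."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def kurtosis(x):
    """Fourth standardized moment (non-excess; a Gaussian gives about 3)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 ** 2)
```

A series dominated by one brief, large event (for example, nine zeros followed by a single 10) has a much higher kurtosis than a regular alternating series, which is why kurtosis is sensitive to transient events within an epoch.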
Pearson correlation was calculated between cognitive scores and the various SCI and BAI. To compare pairs of correlation coefficients (e.g. to evaluate the difference between the strength of correlations of BAI vs. SCI with each cognitive test), we completed Fisher r-to-z transformations for each pair of correlation coefficients 26 .
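A minimal sketch of these two calculations is given below. The comparison shown is the standard Fisher r-to-z test for two correlations treated as coming from independent samples; whether the original analysis accounted for both indices being measured in the same participants is not stated, so this form is an assumption. The function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_compare(r1, n1, r2, n2):
    """Compare two correlations via Fisher's r-to-z transformation.

    Returns (z statistic, two-sided p-value), assuming independent samples.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)      # r-to-z transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))       # two-sided normal tail
    return z, p
```

The transformation is used because the sampling distribution of r is skewed for large |r|, whereas atanh(r) is approximately normal with variance 1/(n − 3).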
To evaluate how well SCI and BAI distinguish individuals who score low versus high on different cognitive tests, we divided participants into three groups of equal size for each cognitive test (1/3 low score, 1/3 medium score, 1/3 high score). We then performed group-level analysis of discriminability using Cuzick's non-parametric test for trend to examine the statistical significance of the difference in SCI across the different score groups. For individual-level analysis of discriminability, we calculated Receiver Operating Characteristic (ROC) curves and the Area Under the ROC Curve (AUC) for each SCI model. For the ROC analysis, the medium score group was excluded to ensure distinctness between groups.
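The individual-level analysis can be sketched as a tertile split followed by an AUC computed from pairwise comparisons (the Mann-Whitney formulation). Function names are hypothetical, the handling of group sizes not divisible by three is an assumption, and Cuzick's trend test is not shown.

```python
def tertile_groups(scores):
    """Return index lists (low, medium, high) of equal size, ranked by score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    third = len(order) // 3
    return order[:third], order[third:len(order) - third], order[len(order) - third:]


def auc(neg, pos):
    """Probability that a random `pos` value exceeds a random `neg` value."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def low_vs_high_auc(cognitive_scores, index_values):
    """AUC of an index for separating low from high scorers (medium dropped)."""
    low, _medium, high = tertile_groups(cognitive_scores)
    return auc([index_values[i] for i in low], [index_values[i] for i in high])
```

An AUC of 0.5 corresponds to chance-level separation of low and high scorers, and 1.0 to perfect separation.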
Evaluating cognitive variation related to age, sex, and education. Performance on cognitive tasks in the NIH Toolbox Cognition Battery depends on age 20 . We reasoned that, if our SCI indicators are valid, they should account for age-related variation in cognitive performance. If so, regression models that include age and SCI should explain no more of the variance in cognitive test performance than regression models that include SCI alone. Similar reasoning applies to other biological variables that might correlate with cognitive performance, including years of education and sex. To address these questions, we created a series of nested regression models and compared each submodel using a likelihood ratio test. Specifically, we first fitted two Elastic Net models for each cognitive test: (1) a submodel with EEG features alone (SCI model), and (2) a full model with EEG features, age, years of education, and sex. We then compared models by calculating the log-likelihood of each model and performing a likelihood ratio test to measure the change in deviance for the submodel.
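The model comparison can be sketched as below, assuming Gaussian residuals so that each model's log-likelihood follows from its residual sum of squares, and assuming the statistic is referred to a chi-square distribution with degrees of freedom equal to the number of added variables (three here: age, years of education, and sex). With penalized fits such as Elastic Net this reference distribution is only approximate. All function names are hypothetical.

```python
import math


def gaussian_loglik(residuals):
    """Maximized Gaussian log-likelihood given a model's residuals."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return -0.5 * n * (math.log(2.0 * math.pi) + math.log(rss / n) + 1.0)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series in odd powers of sqrt(x).
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return (math.erfc(chi / math.sqrt(2.0))
            + math.sqrt(2.0 / math.pi) * math.exp(-half) * total)


def likelihood_ratio_test(resid_sub, resid_full, extra_params=3):
    """Return (deviance change, p-value) for submodel vs. full model."""
    stat = 2.0 * (gaussian_loglik(resid_full) - gaussian_loglik(resid_sub))
    return stat, chi2_sf(max(stat, 0.0), extra_params)
```

A large p-value indicates that adding the demographic variables does not improve the fit beyond what would be expected by chance, which is the pattern interpreted in the text as the SCI already capturing that variation.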
External validation. External validation was performed using EEG data from the Sleep Heart Health Study 27–30 (SHHS), a composite cohort overlapping with the Framingham Heart Study 31 (FHS). Participants were included if they completed a neuropsychological test battery 32 in the FHS within 3 years of their SHHS polysomnography exam date. Scores from the following tests were used for the Wechsler Memory Scale (WMS) score calculation: Logical memory-Immediate Recall, Delayed Recall, Recognition; Visual reproductions-Immediate recall, Delayed Recall, Recognition; Paired Associate Learning-Immediate Recall, Delayed Recall. Of the 476 participants in the validation dataset, 152 were subsequently excluded due to incomplete WMS data, with the remaining 324 available for analysis. WMS does not include tests that are directly comparable with the three NIH toolbox composite scores (total, fluid, and crystallized); therefore we correlated the WMS score with all three composite SCI models (total, fluid, crystallized), with the expectation that these constructs are correlated and thus, if the SCI models capture valid physiologic information related to brain health, they should exhibit some measurable (if nonspecific) correlation with WMS scores.
A subset of participants in the FHS cohort was flagged for possible dementia using criteria as previously described 33–36 . Through the consensus diagnosis process, some of these participants were assigned a Clinical Dementia Rating (CDR)-like dementia severity rating of 0.5 and associated with the diagnosis of cognitive impairment no dementia (CIND). We further evaluated the SCI models by calculating the association between the subset of cases diagnosed as CIND and the SCI model outputs.
Statistical significance was defined using a p-value < 0.05. All statistical analyses were performed with code written in-house using Python (https://www.python.org/). We did not perform corrections for multiple comparisons, as our aim was to measure the correlation of each SCI with its corresponding target cognitive domain rather than to draw a general conclusion about the presence or absence of an association between sleep and cognition; that is, the primary focus of the study was to estimate effect sizes rather than statistical hypothesis testing.

Results
Overall, 168 participants were enrolled; 18 were subsequently excluded from analysis as they were determined to be ineligible or had missing or incomplete data. A flowchart illustrating the screening and enrollment of study participants is shown in Fig. 1.
Correlations of cognitive scores with sleep cognitive indices. We also show the correlation matrix when using the different SCI models to predict each cognitive score (Fig. S2). SCI indicators were normally distributed for all significant SCI models (Fig. S3). The top five features for significant SCI models are listed in Table S3. When evaluating the effect of EEG electrode location, we observed similar performance for the three composite cognition SCI models across different subsets of EEG electrodes (Table 3).
Examination of BAI showed that among the three major cognition domains, only crystallized cognition exhibited a significant correlation, which was negative (Crystallized: r = −0.25). Figure 4 shows a scatter plot and linear fit between BAI and each cognitive test. The opposite signs of the correlations between BAI and PCPS (positive correlation) versus crystallized cognition (negative correlation) scores likely reflect the differential age-related changes observed in fluid and crystallized cognition: fluid cognition tends to decline with age and crystallized cognition likely increases to compensate 24 .
Evaluating cognitive variation related to age, sex, and education. Likelihood ratio tests confirmed that SCI indicators for the three cognition measures and the Flanker ICA, LSWM, PCPS, and PSM cognitive tasks fit the data similarly to a full "brain health" model that incorporated EEG, age, education, and sex features (p = 0.1). Therefore, SCI models adequately capture variation in cognitive performance related to these factors. Detailed metrics for all models are listed in Table S5.
Total and fluid cognition SCI indicators showed similar correlations with the participants' total WMS scores (total: r = 0.31, p < 0.0001; fluid: r = 0.32, p < 0.0001). In contrast, the crystallized cognition SCI model was poorly indicative of participants' total WMS scores (r = 0.07, p = 0.23). Correlations between the three SCI models and cognitive scores, along with score distributions are shown in Fig. 5. No significant change in the strength of association was observed when cognitively impaired participants were excluded from analysis (total: r = 0.30, p < 0.0001; fluid: r = 0.30, p < 0.0001; crystallized: r = 0.06, p = 0.28). Baseline characteristics of patients are listed in Table 4.

Discussion
In this cross-sectional observational study, we demonstrate that machine learning analyses of sleep EEG signals can generate indices that correlate with specific tests of cognition. These novel sleep EEG-derived machine learning models-the SCIs developed in the present study-were optimized to serve as indicators of brain health related to each cognitive task. They achieved a weak to moderate correlation with total cognition, moderate correlation with a composite measure of fluid cognition, and a range of weak to moderate correlations for fluid cognition subtests. SCIs for crystallized cognition and tasks were not correlated with composite crystallized cognition and subtest scores. Crucially, all significant SCI models performed well at differentiating low from high test scorers at the group and individual levels. Overall, our results suggest that overnight sleep EEG is a promising source of indicators of neurocognitive health. This is significant because sleep EEG is increasingly easy to monitor using home devices. Thus, SCIs may have promise for identifying signs of age-related brain diseases that preferentially affect specific aspects of cognitive health and for tracking the physiologic effects of interventions.

SCI versus BAI.
Comparing BAI and SCI performance, we found SCIs exhibited stronger correlations with cognitive scores for total cognition, fluid cognition, and three fluid functional constructs: working memory, episodic memory, and cognitive flexibility. Because fluid cognition often declines at earlier stages of the Alzheimer's Disease (AD) pathologic cascade 24,37 , measures of fluid cognition may serve as sensitive indicators of preclinical AD and increased vulnerability for cognitive decline in cognitively unimpaired adults. In contrast, the previously published BAI was correlated with crystallized composite cognition and subtest scores and with the visual processing speed subtest of fluid cognition. No correlation was found between BAI and total cognition, fluid cognition, or the remaining four fluid subtests. Although we anticipated SCI to show stronger associations with cognition, the lack of an association between BAI and fluid cognition was unexpected.
On further examination, we found that while crystallized cognition and chronological age (CA) were positively correlated (r = 0.37, p = 0.001), BA was negatively correlated with crystallized cognition in our cohort (r = −0.16, p = 0.049). BAI (i.e., BA − CA) was therefore negatively correlated with crystallized cognition. This was not the case for fluid cognition. Because both CA and BA were negatively correlated with fluid cognition, their difference (BAI) was not associated with fluid cognition. The different results for BAI and SCI are in alignment with previous findings that relate distinct brain regions for the two cognition types. When evaluating the effects of different white-matter tracts on fluid and crystallized cognition, one study linked the forceps minor tract with measures of crystallized cognition and the superior longitudinal fasciculus with measures of fluid cognition 23 . The lack of correlation between SCI and crystallized cognition may have arisen because the features computed did not capture predictive information about crystallized cognition or the choice of model was inadequate for this task due to possible non-linear relationships between sleep features and crystallized cognition.
Including non-EEG metrics of health as features of a cognition index could potentially improve correlations between sleep metrics and cognition. For example, one study that predicted individual sleep metrics using age, cognitive scores, status of cardiometabolic disease, and baseline covariates found that individuals who performed above average within their age group exhibited sleep metrics closer to younger and healthier individuals 38 .
Most influential features across SCI models. When reviewing the top contributors to significant SCI models, we found that a higher delta-to-theta ratio in N3 was important for total cognition, and a higher delta-to-alpha ratio in N3 was influential for both total cognition and working memory. This finding is in line with previous studies that show decreased delta band power during sleep for older adults with and without sleep disorders 39,40 and increased delta band power during sleep in response to a motor learning task 41 . Another N3 feature, line length, significantly contributed to total cognition, fluid cognition, cognitive flexibility, processing speed, and episodic memory models. Line length, also referred to as the mean resultant vector length, is the total variation in the signal amplitude and frequency and is a measure of EEG signal complexity. In our models, a larger signal complexity led to stronger correlations with cognition. This finding was likely driven by the line length of slow oscillations-associated spindles and delta-associated spindles during slow-wave sleep 42 .
Kurtosis of band power was also a strong feature for most models. Kurtosis is a measure of the amount of transiently occurring events, and a larger kurtosis corresponds with a more heavy-tailed distribution. For example, many transient 1-s spindles in a 30-s epoch can lead to higher kurtosis in the sigma band (11–15 Hz). In this study, we found that the tail extremity of alpha band power signals near the onset of sleep and during sleep-wake periods contributed to higher correlations with fluid cognition, working memory, and inhibitory control and visual attention scores. Meanwhile, the tail extremity of theta band power signals during N2 contributed to higher correlations with fluid cognition and four fluid functional constructs, excluding episodic memory. For total cognition, the tail extremity of delta band power signals during REM sleep likewise contributed to higher correlations.
With respect to spindle and slow oscillation features, spindle density during N2 was one of the top three contributors to the composite fluid cognition, inhibitory control and visual attention, cognitive flexibility, and processing speed models. This finding aligns with previous literature that links spindle density with different measures of fluid cognition and functional constructs 43,44 . Spindle amplitude, duration or frequency did not appear to be important. Further, the number of spindles that overlapped with a detected slow oscillation in N2 was an important feature for total cognition and three fluid functional constructs: cognitive flexibility, processing speed, and episodic memory. Coupling between the phase of slow-wave oscillations and spindle activity has been shown to facilitate memory consolidation and performance 45 and influence cognitive impairment in older adults 46 . These studies further support our episodic memory model, for which the second most influential feature was the circular mean of slow oscillation phase at spindle peak, or mean coupling direction. We also discovered that slow oscillation peak duration predicts cognitive flexibility. Slow oscillation slope, which has been linked to the effectiveness of neuronal synchronization at the cortical level 47 , was also found to predict episodic memory.
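The "mean coupling direction" mentioned above is a circular mean, which differs from an ordinary average because phase wraps around at ±π. The following minimal sketch assumes phases are given in radians, one per spindle; the function name is hypothetical and this is not the feature-extraction code used in the study.

```python
import math


def circular_mean(phases):
    """Mean direction (radians, in [-pi, pi]) of a set of phase angles."""
    s = sum(math.sin(p) for p in phases)
    c = sum(math.cos(p) for p in phases)
    return math.atan2(s, c)
```

For two spindles whose peaks occur at slow oscillation phases just either side of ±π (for example, π − 0.1 and −π + 0.1), the arithmetic mean is 0, which points to the opposite side of the cycle, whereas the circular mean correctly lies at ±π.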
Among the sleep architecture features (macrostructure), the percentage of REM sleep was the only highly influential one and ranked as the third most important feature for working memory. This result agrees with previous studies that support the role of REM duration in working memory performance 48,49 . While REM occurs in the middle of the night, other macrostructures are more likely to be affected by the fact that the MGH dataset is a clinical dataset. For example, for total sleep time (TST), multiple studies have shown a U-shape relationship 50–53 where overly long or short TST is associated with worse cognition and TST between 6 and 8 h is associated with better cognition. However, the sleep architecture in the sleep lab may not reflect participants' habitual sleep, given that they are awoken around 6 am and experience the "first-night effect" associated with sleep studies. This also affects wake after sleep onset, sleep efficiency, total time in bed, sleep latency and REM latency.

Goodness of fit.
A goodness of fit test showed that SCI models for all cognitive composite scores and all but three subtest scores (picture vocabulary, working memory, and episodic memory) were not improved by adding age and sex features. This suggests that SCI models capture changes in neurocognitive health related to age and sex via features of brain activity during sleep.
EEG electrodes used in the SCI models. As shown in Table 3, the SCI model trained using central EEG electrodes performs best. This could be explained by the top features: the delta band power during N3 is highest at the central location, therefore the delta-to-theta power ratio during N3 at the central location is the most predictive. Similarly, the spindles at the central electrodes are the so-called fast spindles, which have been shown to correlate with cognition more strongly than slow spindles at frontal electrodes 54 .
External validation. We investigated whether SCI designed for the three proposed measures of cognition were indicative of performance on the WMS in the SHHS/FHS dataset and found significant correlations between SCI for fluid and total cognition and WMS scores. Both models resulted in comparable correlations, while the crystallized model had no correlation. Compared to our MGH dataset, the overall performance for total and fluid cognition SCI models was reduced in the SHHS/FHS dataset. This difference in performance is likely driven by differences in methodologies, such as the use of different neuropsychological batteries (the WMS selected does not include specific measures of processing speed or working memory), the larger gap between neuropsychological and polysomnography exam dates (SHHS/FHS ≤ 1095 days; MGH ≤ 40 days), and the difference in the average age of the two cohorts (SHHS/FHS: 62 years; MGH: 49 years). Sex was evenly divided for both cohorts, while the level of education could not be compared due to different methods of capturing education levels.
Limitations. Our study has several limitations, one of which is selection bias. As the study was offered only to those undergoing a PSG for suspected sleep disorders, participants likely had at least subjectively abnormal sleep. In addition, we did not control for medications. Therefore, the cohort would not be reflective of a healthy population. Participants also lacked racial (76% White) and socioeconomic diversity. As the single in-lab PSG setting is known to create the first-night effect, sleep for some participants may not represent typical sleep at home. Lastly, noise in the cognition scores may exist, as we did not control for the time of day when administering the cognitive test battery or the time between PSG and cognitive assessment.
Future directions. In future research, the night-to-night variability of SCI should be considered, as our previous work shows that the average night-to-night standard deviation in calculating BAI is 7.5 years, which can be reduced to less than 5 years by averaging consecutive nights 55 . Because calculating SCI only requires two central electrodes, information on night-to-night variability can be conveniently captured using home-based EEG recording devices to improve the reliability of SCI measurements. Additionally, although BA is commonly measured using magnetic resonance imaging (MRI) 56 , structural MRI scans remain costly, inaccessible to claustrophobic patients and those with metal implants, difficult to deploy or repeat, and do not measure functional status. Thus, sleep EEG-based brain age and health biomarkers may address some of these concerns due to the cost-effectiveness of EEG devices, the accessibility of home-based EEG recording devices, and the aging-associated changes in sleep EEG 57 .
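To give a rough sense of why averaging nights reduces variability, the sketch below applies the textbook result that the standard deviation of a mean of n independent measurements is the single-measurement standard deviation divided by √n. Independence between nights is an assumption made here for illustration only; the cited work may have used a different procedure, and the function names are hypothetical.

```python
import math


def sd_of_mean(sd_single_night, n_nights):
    """SD of an n-night average, assuming independent night-to-night errors."""
    return sd_single_night / math.sqrt(n_nights)


def nights_needed(sd_single_night, target_sd):
    """Smallest number of nights whose average has SD strictly below target."""
    n = 1
    while sd_of_mean(sd_single_night, n) >= target_sd:
        n += 1
    return n
```

Under this assumption, a single-night standard deviation of 7.5 years falls to about 5.3 years with two nights and about 4.3 years with three, so three nights would be the smallest number giving a value below 5 years.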
To understand the potential benefit in clinical settings, future work is needed to evaluate this biomarker in a more diverse population with cognitive impairment with underlying neuropathologic changes.

Conclusion
Sleep cognitive indices (SCI) are correlated with measures of total and fluid cognition, while the brain age index (BAI) is correlated with measures of crystallized cognition. Key features contributing to the observed relationships include delta-to-theta and delta-to-alpha band power ratios, kurtosis, spindle density, coupling between slow oscillations and spindles, and percentage of REM sleep. Further research is needed to improve the stability of SCI and to validate SCI as a brain health biomarker.

Data availability
The MGH dataset is available from the corresponding author upon reasonable request. The SHHS dataset is available from https://sleepdata.org/datasets/shhs. The cognitive test results from the FHS (overlapping with SHHS) are available from https://www.framinghamheartstudy.org/.